Abstract. We present several results on the smoothness in the $L_p$ sense of filtering densities under the Lipschitz continuity assumption on the coefficients of a partially observable diffusion process. We obtain them by rewriting in divergence form the filtering equations, which are usually considered in terms of formal adjoints of operators in nondivergence form.

1. Introduction 

Let $(\Omega, \mathcal{F}, P)$ be a complete probability space with an increasing filtration $\{\mathcal{F}_t, t \ge 0\}$ of $\sigma$-fields $\mathcal{F}_t \subset \mathcal{F}$ that are complete with respect to $(\mathcal{F}, P)$. Denote by $\mathcal{P}$ the predictable $\sigma$-field in $\Omega \times (0, \infty)$ associated with $\{\mathcal{F}_t\}$. Let $w^k_t$, $k = 1, 2, \ldots$, be independent one-dimensional Wiener processes with respect to $\{\mathcal{F}_t\}$.

We fix a stopping time $\tau$ and for $t \le \tau$ in the Euclidean $d$-dimensional space $\mathbb{R}^d$ of points $x = (x^1, \ldots, x^d)$ we consider the following equation

$$du_t = (L_t u_t + D_i f^i_t + f^0_t)\,dt + (\Lambda^k_t u_t + g^k_t)\,dw^k_t, \tag{1.1}$$

where $u_t = u_t(x) = u_t(\omega, x)$ is an unknown function,

$$L_t \psi(x) = D_j\bigl(a^{ij}_t(x) D_i \psi(x) + a^j_t(x) \psi(x)\bigr) + b^i_t(x) D_i \psi(x) + c_t(x) \psi(x),$$

$$\Lambda^k_t \psi(x) = \sigma^{ik}_t(x) D_i \psi(x) + \nu^k_t(x) \psi(x),$$

the summation convention with respect to $i, j = 1, \ldots, d$ and $k = 1, 2, \ldots$ is enforced and detailed assumptions on the coefficients and the free terms will be given later.

One can rewrite (1.1) in the nondivergence form assuming that the coefficients $a^{ij}_t$ and $a^j_t$ are differentiable in $x$ and then one could apply the results from [5]. It turns out that the differentiability of $a^{ij}_t$ and $a^j_t$ is not needed for the corresponding counterparts of the results in [5] to be true, and showing this and generalizing the corresponding results of [3] is one of the main purposes of Section 2 of the present article. We assume, roughly speaking, that $a^{ij}_t(x)$ are measurable in $t$ and of class VMO with respect to $x$.
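
To make the first remark concrete, here is the formal product-rule computation behind the passage to nondivergence form. It is only a sketch, valid under the extra assumption (not made in this article) that $a^{ij}_t$ and $a^j_t$ are differentiable in $x$; it is not a statement taken from [5].

```latex
% Sketch: nondivergence form of L_t, ASSUMING a^{ij}_t and a^j_t are differentiable in x
\begin{align*}
L_t\psi
 &= D_j\bigl(a^{ij}_t D_i\psi + a^j_t\psi\bigr) + b^i_t D_i\psi + c_t\psi \\
 &= a^{ij}_t D_iD_j\psi
   + \bigl(D_j a^{ij}_t + a^i_t + b^i_t\bigr) D_i\psi
   + \bigl(D_j a^j_t + c_t\bigr)\psi .
\end{align*}
```

The first-order and zeroth-order coefficients of the rewritten operator contain $D_j a^{ij}_t$ and $D_j a^j_t$, which is why this route needs differentiability while the divergence form does not.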

One of the main motivations for developing the theory of SPDEs comes from filtering theory of partially observable diffusion processes. This problem is stated as follows. Let $d \ge 1$, $d_1 > d$ be integers.

Consider a $d_1$-dimensional two component process $z_t = (x_t, y_t)$ with $x_t$ being $d$-dimensional and $y_t$ $(d_1 - d)$-dimensional. We assume that $z_t$ is a diffusion process defined as a solution of the system

$$dx_t = b(t, z_t)\,dt + \theta(t, z_t)\,dw_t,$$
$$dy_t = B(t, z_t)\,dt + \Theta(t, y_t)\,dw_t \tag{1.2}$$

with some initial data.

The coefficients of (1.2) are assumed to be vector- or matrix-valued functions of appropriate dimensions defined on $[0, T] \times \mathbb{R}^{d_1}$. Actually $\Theta(t, y)$ is assumed to be independent of $x$, so that it is a function on $[0, T] \times \mathbb{R}^{d_1 - d}$ rather than $[0, T] \times \mathbb{R}^{d_1}$, but as always we may think of $\Theta(t, y)$ as a function of $(t, z)$ as well.

The component $x_t$ is treated as unobservable and $y_t$ as the only observations available. The problem is to find a way to compute the density $\pi_t(x)$ of the conditional distribution of $x_t$ given $y_s$, $s \le t$. Finding an equation satisfied by $\pi_t$ (filtering equation) is considered to be a solution of the (filtering) problem. The filtering equations turn out to be particular cases of SPDEs.

In 1964 in [14] the filtering equations were proposed in a somewhat nonrigorous way and most likely some terms in these equations appeared from stochastic integrals written in the Stratonovich form and the others appeared from the Itô integrals. Perhaps, the author of [14] realized this too and published an attempt to rescue some results of [14] in 1967 in [15]. This attempt turned out to be successful for simplified models without the so-called cross terms.

Meanwhile, in 1966 in [20] the correct filtering equations in full generality, yet assuming some regularity of the filtering density, were presented. This is the reason we propose to call the filtering equations in the case of partially observable diffusion processes Shiryaev's equations and their particular case without cross terms Kushner's equations.

In case $d = 1$ the result of [20] is presented in [17] on the basis of the famous Fujisaki-Kallianpur-Kunita theorem (see [2]) about the filtering equations in a very general setting. Some authors even call the filtering equation for diffusion processes the Fujisaki-Kallianpur-Kunita equation.

By adding to the Fujisaki-Kallianpur-Kunita theorem some simple facts from the theory of SPDEs, the a priori regularity assumption was removed in [9] and under the Lipschitz and uniform nondegeneracy assumption the $L_2$-version of Theorem 3.2 was proved. The basic result of [9] is that $\pi_t \in W^1_2$. It is also proved that if the coefficients are smoother, $\pi_t(x)$ is smoother too. The nondegeneracy assumption was later removed (see [19]) on the account of assuming that $\theta\theta^*$ is three times continuously differentiable in $x$. It is again proved that $\pi_t \in W^1_2$ and $\pi_t$ is even smoother if the coefficients are smoother.

In [5] the results of [9] were improved: $\theta\theta^*$ is assumed to be twice continuously differentiable in $x$ and it is shown that $\pi_t \in W^2_p$ with any $p \ge 2$.

The above mentioned results of [9], [19], and [5] use filtering theory in combination with the theory of SPDEs, the latter being stimulated by certain needs of filtering theory. It turns out that the theory of SPDEs alone can be used to obtain the above mentioned regularity results about $\pi_t$ without knowing anything from filtering theory itself. It also can be used to solve other problems from filtering theory.

The first "direct" (only using the theory of SPDEs) proof of regularity of $\pi_t$ is given in [11] in the case that system (1.2) defines a nondegenerate diffusion process and $\theta\theta^*$ is twice continuously differentiable in $x$. It is proved that $\pi_t \in W^2_p$ with any $p \ge 2$ as in [5]. Advantages of having arbitrary $p$ are seen from results like our Theorem 3.3. Of course, on the way of investigating $\pi_t$ in [11] the filtering equations are derived "directly" in an absolutely different manner than before (on the basis of an idea from [10]).

In Section 3 of this article we relax the smoothness assumption in [11] to the assumption that the coefficients of (1.2) are merely Lipschitz continuous, the assumption which is almost always supposed to hold when one deals with systems like (1.2). We find that $\pi_t \in W^1_p$. Thus, under the weakest smoothness assumptions we obtain the best (in the author's opinion) regularity result on $\pi_t$. In particular, we prove that if the initial data is sufficiently regular, then the filtering density is almost Lipschitz continuous in $x$ and almost $1/2$ Hölder continuous in $t$. However, we still assume $z_t$ to be nondegenerate. Our approach is heavily based on analytic results. There is also a probabilistic approach developed in [13] and based on explicit formulas for solutions introduced in [16] and later developed in [10] and [12] (also see references therein). This approach cannot give as sharp results as ours in our situation.

It seems to the author that under the same assumptions of Lipschitz continuity, by following an idea from [1] one can solve another problem from filtering theory, the so-called innovation problem, and obtain the equality

$$\sigma\{y_s, s \le t\} = \sigma\{\check{w}_s, s \le t\},$$

where $\check{w}_t$ is the innovation Wiener process of the problem (its definition is recalled in Section 3). Recall that for degenerate diffusion processes the positive solution of the innovation problem is obtained in [18] again on the basis of the theory of SPDEs under the assumption that the coefficients are more regular.

By the way, in our situation, if the coefficients are more regular, the filtering equation can be rewritten in a nondivergence form and then additional smoothness of the filtering density, existence of which is already established in this article, is obtained on the basis of regularity results from [5].

Although for the proof of the above mentioned results concerning the filtering equations it suffices to use article [3] about SPDEs in divergence form with continuous coefficients, we prefer to give more general results borrowed from [7] in Section 2. In Section 3 we present some results about the filtering equations from [8].

We finish this section by introducing some notation. Let $K, \delta > 0$ be fixed finite constants, $p \in [2, \infty)$. Denote $L_p = L_p(\mathbb{R}^d)$, $C_0^\infty = C_0^\infty(\mathbb{R}^d)$. Introduce

$$D_i = \frac{\partial}{\partial x^i}, \quad i = 1, \ldots, d.$$

By $Du$ we mean the gradient with respect to $x$ of a function $u$ on $\mathbb{R}^d$. As usual,

$$W^1_p = \{u \in L_p : Du \in L_p\}, \quad \|u\|_{W^1_p} = \|u\|_{L_p} + \|Du\|_{L_p}.$$

We use the same notation $L_p$ for vector- and matrix-valued or else $\ell_2$-valued functions such as $g_t = (g^k_t)$ in (1.1). For instance, if $u(x) = (u^1(x), u^2(x), \ldots)$ is an $\ell_2$-valued measurable function on $\mathbb{R}^d$, then

$$\|u\|^p_{L_p} = \int_{\mathbb{R}^d} |u(x)|^p_{\ell_2}\,dx = \int_{\mathbb{R}^d} \Bigl(\sum_{k=1}^\infty |u^k(x)|^2\Bigr)^{p/2} dx.$$

Recall that $\tau$ is a stopping time and introduce

$$\mathbb{L}_p(\tau) := L_p((\!(0, \tau]\!], \mathcal{P}, L_p), \quad \mathbb{W}^1_p(\tau) := L_p((\!(0, \tau]\!], \mathcal{P}, W^1_p).$$

We also need the space $\mathcal{W}^1_p(\tau)$, which is the space of functions $u_t = u_t(\omega, \cdot)$ on $\{(\omega, t) : 0 \le t \le \tau, t < \infty\}$ with values in the space of generalized functions on $\mathbb{R}^d$ and having the following properties:

(i) $u_0 \in L_p(\Omega, \mathcal{F}_0, L_p)$;

(ii) $u \in \mathbb{W}^1_p(\tau)$;

(iii) There exist $f^i \in \mathbb{L}_p(\tau)$, $i = 0, \ldots, d$, and $g = (g^1, g^2, \ldots) \in \mathbb{L}_p(\tau)$ such that for any $\varphi \in C_0^\infty$ with probability 1 for all $t \in [0, \infty)$ we have

$$(u_{t \wedge \tau}, \varphi) = (u_0, \varphi) + \sum_{k=1}^\infty \int_0^t I_{s \le \tau}\,(g^k_s, \varphi)\,dw^k_s + \int_0^t I_{s \le \tau} \bigl((f^0_s, \varphi) - (f^i_s, D_i \varphi)\bigr)\,ds, \tag{1.3}$$

where by $(f, \varphi)$ we mean the action of a generalized function $f$ on $\varphi$; in particular, if $f$ is locally summable,

$$(f, \varphi) = \int_{\mathbb{R}^d} f(x) \varphi(x)\,dx.$$

Observe that, for any $\varphi \in C_0^\infty$, the process $(u_{t \wedge \tau}, \varphi)$ is $\mathcal{F}_t$-adapted and (a.s.) continuous.

The reader can find in [5] a discussion of (ii) and (iii), in particular, the fact that the series in (1.3) converges uniformly in probability on every finite subinterval of $[0, \tau]$. In case that property (iii) holds, we write

$$du_t = (D_i f^i_t + f^0_t)\,dt + g^k_t\,dw^k_t \tag{1.4}$$

for $t \le \tau$ and this explains the sense in which equation (1.1) is understood. Of course, we still need to specify appropriate assumptions on the coefficients and the free terms in (1.1).
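
Formally, (1.3) is what one obtains from (1.4) by pairing both sides with a test function. The following sketch records this step; it assumes nothing beyond the definition of the generalized derivative and is meant only as a reading aid.

```latex
% Sketch: from the shorthand (1.4) to the integral identity (1.3)
% Only the definition of the generalized derivative is used.
\begin{align*}
(D_i f^i_s, \varphi) &= -(f^i_s, D_i\varphi),
   \qquad \varphi \in C_0^\infty, \\
(u_{t\wedge\tau}, \varphi)
 &= (u_0, \varphi)
  + \int_0^t I_{s\le\tau}\bigl((f^0_s,\varphi) - (f^i_s, D_i\varphi)\bigr)\,ds
  + \sum_{k=1}^\infty \int_0^t I_{s\le\tau}\,(g^k_s,\varphi)\,dw^k_s .
\end{align*}
```

The minus sign in front of $(f^i_s, D_i\varphi)$ is the only trace of the divergence structure $D_i f^i$ in the weak formulation.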

The work was partially supported by NSF Grant DMS-0653121. 

2. SPDEs in divergence form with VMO coefficients

We consider (1.1) under the following assumptions.

Assumption 2.1. (i) The coefficients $a^{ij}_t$, $a^i_t$, $b^i_t$, $\sigma^{ik}_t$, $c_t$, and $\nu^k_t$ are measurable with respect to $\mathcal{P} \times \mathcal{B}(\mathbb{R}^d)$, where $\mathcal{B}(\mathbb{R}^d)$ is the Borel $\sigma$-field on $\mathbb{R}^d$.

(ii) For all values of indices and arguments

$$|a^i_t| + |b^i_t| + |c_t| + |\nu_t|_{\ell_2} \le K, \quad c_t \le 0.$$

(iii) For all values of the arguments and $\xi \in \mathbb{R}^d$

$$a^{ij}_t \xi^i \xi^j \le \delta^{-1} |\xi|^2, \quad (a^{ij}_t - \alpha^{ij}_t) \xi^i \xi^j \ge \delta |\xi|^2, \tag{2.1}$$

where $\alpha^{ij}_t = (1/2)(\sigma^{i\cdot}_t, \sigma^{j\cdot}_t)_{\ell_2}$.
It is worth emphasizing that we do not require the matrix $(a^{ij})$ to be symmetric. Assumption 2.1 (i) guarantees that equation (1.1) makes perfect sense if $u \in \mathcal{W}^1_p(\tau)$.

For functions $h_t(x)$ on $[0, \infty) \times \mathbb{R}^d$ and balls $B$ in $\mathbb{R}^d$ introduce

$$h_{t(B)} = \frac{1}{|B|} \int_B h_t(x)\,dx,$$

where $|B|$ is the volume of $B$. If $\rho > 0$, set $B_\rho = \{x : |x| < \rho\}$ and for locally integrable $h_t(x)$ and continuous $\mathbb{R}^d$-valued function $x_r$, $r \ge 0$, introduce

$$\mathrm{osc}_\rho(h, x_\cdot) = \sup_{s \ge 0} \frac{1}{\rho^2} \int_s^{s+\rho^2} \bigl(|h_r - h_{r(B+x_r)}|\bigr)_{(B+x_r)}\,dr,$$

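where $B = B_\rho$. Also for $y \in \mathbb{R}^d$ set

$$\mathrm{Osc}_\rho(h, y) = \sup_{|x_\cdot|_C \le \rho}\ \sup_{r \le \rho} \mathrm{osc}_r(h, y + x_\cdot),$$

where $|x_\cdot|_C$ is the sup norm of $|x_\cdot|$. Observe that $\mathrm{osc}_\rho(h, x_\cdot) = 0$ if $h_t(x)$ is independent of $x$.

Denote by $\beta_0$ one third of the constant $\beta_0(d, p, \delta) > 0$ from Lemma 5.1 of [7].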

Assumption 2.2. There exists a constant $\varepsilon \in (0, 1]$ such that for any $y \in \mathbb{R}^d$ (and $\omega$) we have

$$\mathrm{Osc}_\varepsilon(a^{ij}, y) \le \beta_0, \quad \forall i, j. \tag{2.2}$$

Furthermore,

(al k (x)-al k (y)We>m 2 
for all t, and x satisfying \x — y\ < e. 

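Let $\beta_1 = \beta_1(d, p, \delta, \varepsilon) > 0$ be the constant from Lemma 5.2 of [7].

Assumption 2.3. There exists a constant $\varepsilon_1 > 0$ such that for any $t \ge 0$ we have

$$|\sigma^{i\cdot}_t(x) - \sigma^{i\cdot}_t(y)|_{\ell_2} \le \beta_1,$$

whenever $x, y \in \mathbb{R}^d$, $|x - y| \le \varepsilon_1$, $i = 1, \ldots, d$.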

Finally, we describe the space of initial data. Recall that for $p \ge 2$ the Slobodetskii space $W^{1-2/p}_p = W^{1-2/p}_p(\mathbb{R}^d)$ of functions $u_0(x)$ can be introduced as the space of traces on $t = 0$ of (deterministic) functions $u$ such that

$$u \in L_p(\mathbb{R}_+, W^1_p), \quad \partial u/\partial t \in L_p(\mathbb{R}_+, H^{-1}_p),$$

where $\mathbb{R}_+ = (0, \infty)$ and $H^{-1}_p = (1 - \Delta)^{1/2} L_p$. For such functions there is a (unique) modification denoted again $u$ such that $u_t$ is a continuous $L_p$-valued function on $[0, \infty)$ so that $u_0$ is well defined. Any such $u_t$ is called an extension of $u_0$.

The norm in $W^{1-2/p}_p$ can be defined as the infimum of

$$\|u\|_{L_p(\mathbb{R}_+, W^1_p)} + \|\partial u/\partial t\|_{L_p(\mathbb{R}_+, H^{-1}_p)}$$

over all extensions $u_t$ of elements $u_0$.

Theorem 2.1. Let $f^j, g \in \mathbb{L}_p(\tau)$ and let $u_0 \in L_p(\Omega, \mathcal{F}_0, W^{1-2/p}_p)$. Then

(i) Equation (1.1) for $t \le \tau \wedge T$ has a unique solution $u \in \mathcal{W}^1_p(\tau \wedge T)$ with initial data $u_0$ for any constant $T \in (0, \infty)$.

(ii) There exists a set $\Omega' \subset \Omega$ of full probability such that $u_{t \wedge \tau} I_{\Omega'}$ is a continuous $\mathcal{F}_t$-adapted $L_p$-valued function of $t \in [0, \infty)$.

Assertion (ii) of Theorem 2.1 follows from assertion (i) and Theorem 2.4.

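Here is a result about continuous dependence of solutions on the data.

Theorem 2.2. Assume that for each $n = 1, 2, \ldots$ we are given functions $a^{ij}_{nt}$, $a^i_{nt}$, $b^i_{nt}$, $c_{nt}$, $\sigma^{ik}_{nt}$, $\nu^k_{nt}$, $f^j_{nt}$, $g^k_{nt}$, and $u_{n0}$ having the same meaning and satisfying the same assumptions with the same $\delta$, $K$, $\varepsilon$, $\varepsilon_1$, $\beta_0$, and $\beta_1$ as the original ones. Assume that for $i, j = 1, \ldots, d$ and almost all $(\omega, t, x)$ we have

$$(a^{ij}_{nt}, a^i_{nt}, b^i_{nt}, c_{nt}) \to (a^{ij}_t, a^i_t, b^i_t, c_t),$$
$$|\sigma^{i\cdot}_{nt} - \sigma^{i\cdot}_t|_{\ell_2} + |\nu_{nt} - \nu_t|_{\ell_2} \to 0,$$

as $n \to \infty$. Also assume that

$$\sum_{j=0}^d \|f^j_n - f^j\|_{\mathbb{L}_p(\tau)} + \|g_n - g\|_{\mathbb{L}_p(\tau)} + \|u_{n0} - u_0\|_{L_p(\Omega, \mathcal{F}_0, W^{1-2/p}_p)} \to 0$$

as $n \to \infty$. Let $u_n$ be the unique solutions of equations (1.1) for $t \le \tau$ constructed from $a^{ij}_{nt}$, $a^i_{nt}$, $b^i_{nt}$, $c_{nt}$, $\sigma^{ik}_{nt}$, $\nu^k_{nt}$, $f^j_{nt}$, and $g^k_{nt}$ and having initial values $u_{n0}$.

Then, for any $T \in [0, \infty)$ as $n \to \infty$, we have $\|u_n - u\|_{\mathbb{W}^1_p(\tau \wedge T)} \to 0$ and

$$E \sup_{t \le \tau \wedge T} \|u_{nt} - u_t\|^p_{L_p} \to 0.$$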

In many situations the following maximum principle based on the results of [5] is useful.

Theorem 2.3. Suppose that, for $q \in [2, p]$, Assumptions 2.2 and 2.3 are satisfied with $\beta_0 \le \beta_0(d, q, \delta)$ and $\beta_1 \le \beta_1(d, q, \delta, \varepsilon)$. Also suppose that $u_0 \in L_p(\Omega, \mathcal{F}_0, W^{1-2/q}_q)$, $q \in [2, p]$, $u_0 \ge 0$, $f^i = 0$, $i = 1, \ldots, d$, $f^0 \ge 0$, $g = 0$. Then for the solution $u$ almost surely we have $u_t \ge 0$ for all finite $t \le \tau$.

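Part of the proofs of the above results is based on the following Itô's formula.

Theorem 2.4. Let $u \in \mathcal{W}^1_p(\tau)$, $f^j \in \mathbb{L}_p(\tau)$, $g = (g^k) \in \mathbb{L}_p(\tau)$ and assume that (1.4) holds for $t \le \tau$ in the sense of generalized functions. Then there is a set $\Omega' \subset \Omega$ of full probability such that

(i) $u_{t \wedge \tau} I_{\Omega'}$ is a continuous $L_p$-valued $\mathcal{F}_t$-adapted function on $[0, \infty)$;

(ii) for all $t \in [0, \infty)$ and $\omega \in \Omega'$ Itô's formula holds:

$$\int_{\mathbb{R}^d} |u_{t \wedge \tau}|^p\,dx = \int_{\mathbb{R}^d} |u_0|^p\,dx + p \int_0^{t \wedge \tau} \int_{\mathbb{R}^d} |u_s|^{p-2} u_s g^k_s\,dx\,dw^k_s$$
$$+ \int_0^{t \wedge \tau} \Bigl( \int_{\mathbb{R}^d} \bigl[ p |u_s|^{p-2} u_s f^0_s - p(p-1) |u_s|^{p-2} f^i_s D_i u_s + (1/2) p(p-1) |u_s|^{p-2} |g_s|^2_{\ell_2} \bigr]\,dx \Bigr)\,ds. \tag{2.3}$$

Furthermore, for any $T \in [0, \infty)$

$$E \sup_{t \le \tau \wedge T} \|u_t\|^p_{L_p} \le 2E\|u_0\|^p_{L_p} + N T^{p-1} \|f^0\|^p_{\mathbb{L}_p(\tau)} + N T^{(p-2)/2} \Bigl( \sum_{i=1}^d \|f^i\|^p_{\mathbb{L}_p(\tau)} + \|g\|^p_{\mathbb{L}_p(\tau)} + \|Du\|^p_{\mathbb{L}_p(\tau)} \Bigr), \tag{2.4}$$

where $N = N(d, p)$.

We have a direct proof of this result. However, (2.3) can also be obtained by extending some arguments from [1].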



3. Filtering equations 

Fix a constant $T \in (0, \infty)$ and for simplicity assume that $w_t$ in (1.2) is finite dimensional. First we state and discuss our assumptions.

Assumption 3.1. The functions $b$, $\theta$, $B$, and $\Theta$ are Borel measurable and bounded functions of their arguments. Each of them satisfies the Lipschitz condition in $z$ with the constant $K$.

Assumption 3.2. The process $z_t$ is uniformly nondegenerate: for any $\lambda, z \in \mathbb{R}^{d_1}$ and $t \in [0, T]$ we have

af(z)X i X j >5\X\ 2 , 

where 2a t {z) = 2(af (z)) = 6(t, z)6*{t, z) + G(t, y)Q*{t, y). 



Traditionally, Assumption 3.2 is split into the following two assumptions, the combination of which is equivalent to Assumption 3.2 and in which some useful objects are introduced. These assumptions were also used in the past to reduce system (1.2) to the so-called triangular form by replacing $w_t$ with a different Brownian motion.

Assumption 3.3. The symmetric matrix $\Theta\Theta^*$ is invertible and

$$\Psi := (\Theta\Theta^*)^{-1/2}$$

is a bounded function of $(t, y)$.

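Assumption 3.4. For any $\xi \in \mathbb{R}^d$, $z = (x, y) \in \mathbb{R}^{d_1}$, and $t \ge 0$, we have

$$|Q(t, y) \theta^*(t, z) \xi|^2 \ge \delta |\xi|^2,$$

where $Q$ is the orthogonal projector on $\mathrm{Ker}\,\Theta$. In other words,

$$\bigl(\theta (I - \Theta^* \Psi^2 \Theta) \theta^* \xi, \xi\bigr) \ge \delta |\xi|^2. \tag{3.1}$$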
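
The equivalence of the two inequalities in Assumption 3.4 can be checked directly. The following is a sketch under the assumption that $\Psi$ is normalized so that $\Psi^2 = (\Theta\Theta^*)^{-1}$, in which case one may take $Q = I - \Theta^*\Psi^2\Theta$.

```latex
% Sketch. ASSUMPTION: \Psi^2 = (\Theta\Theta^*)^{-1} and Q := I - \Theta^*\Psi^2\Theta.
\begin{align*}
Q^* &= Q, \qquad
\Theta Q = \Theta - (\Theta\Theta^*)(\Theta\Theta^*)^{-1}\Theta = 0, \qquad
Q^2 = Q - \Theta^*\Psi^2(\Theta Q) = Q, \\
Qv &= v \quad \text{whenever } \Theta v = 0, \\
|Q\theta^*\xi|^2 &= (Q\theta^*\xi, Q\theta^*\xi)
  = (\theta Q^* Q \theta^*\xi, \xi)
  = \bigl(\theta(I - \Theta^*\Psi^2\Theta)\theta^*\xi, \xi\bigr).
\end{align*}
```

The first two lines show that $Q$ is a symmetric idempotent whose range is exactly $\mathrm{Ker}\,\Theta$, that is, the orthogonal projector on $\mathrm{Ker}\,\Theta$; the last line then turns the first inequality of the assumption into (3.1).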

Assumption 3.5. The random vectors $x_0$ and $y_0$ are independent of the process $w_t$. The conditional distribution of $x_0$ given $y_0$ has a density, which we denote by $\pi_0(x) = \pi_0(\omega, x)$. We have $\pi_0 \in L_p(\Omega, W^{1-2/p}_p)$.

Next we introduce some more notation. Let

$$\Psi_t = \Psi(t, y_t), \quad \Theta_t = \Theta(t, y_t), \quad a_t(x) = \tfrac{1}{2} \theta\theta^*(t, x, y_t), \quad b_t(x) = b(t, x, y_t),$$

$$\sigma_t(x) = \theta(t, x, y_t) \Theta^*_t \Psi_t, \quad \beta_t(x) = \Psi_t B(t, x, y_t).$$

In the remainder of the article we use the notation

$$D_i = \frac{\partial}{\partial x^i}$$

only for $i = 1, \ldots, d$ and set

$$L_t(x) = a^{ij}_t(x) D_i D_j + b^i_t(x) D_i, \tag{3.2}$$

$$L^*_t(x) u_t(x) = D_i D_j\bigl(a^{ij}_t(x) u_t(x)\bigr) - D_i\bigl(b^i_t(x) u_t(x)\bigr) = D_j\bigl(a^{ij}_t(x) D_i u_t(x) - b^j_t(x) u_t(x) + u_t(x) D_i a^{ij}_t(x)\bigr), \tag{3.3}$$

$$\Lambda^k_t(x) u_t(x) = \beta^k_t(x) u_t(x) + \sigma^{ik}_t(x) D_i u_t(x), \tag{3.4}$$

$$\Lambda^{k*}_t(x) u_t(x) = \beta^k_t(x) u_t(x) - D_i\bigl(\sigma^{ik}_t(x) u_t(x)\bigr) = -\sigma^{ik}_t(x) D_i u_t(x) + \bigl(\beta^k_t(x) - D_i \sigma^{ik}_t(x)\bigr) u_t(x), \tag{3.5}$$

where $t \in [0, T]$, $x \in \mathbb{R}^d$, $k = 1, \ldots, d_1 - d$, and as above we use the summation convention. Observe that Lipschitz continuous functions have bounded generalized derivatives and by $D_i a^{ij}_t$ and $D_i \sigma^{ik}_t$ we mean these derivatives. Obviously, the operator $L_t$ defined by (3.2) is uniformly elliptic with constant of ellipticity $\delta$.
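
For the reader's convenience, here is the product-rule computation behind the second equalities in (3.3) and (3.5). It is a formal sketch that relies only on the bounded generalized derivatives of the Lipschitz continuous coefficients and on $D_i D_j = D_j D_i$.

```latex
% Sketch: product rule behind the second equalities in (3.3) and (3.5).
% D_i a^{ij}_t and D_i \sigma^{ik}_t are bounded generalized derivatives.
\begin{align*}
D_iD_j\bigl(a^{ij}_t u_t\bigr) - D_i\bigl(b^i_t u_t\bigr)
 &= D_j\bigl(a^{ij}_t D_i u_t + u_t D_i a^{ij}_t\bigr) - D_j\bigl(b^j_t u_t\bigr) \\
 &= D_j\bigl(a^{ij}_t D_i u_t - b^j_t u_t + u_t D_i a^{ij}_t\bigr), \\
\beta^k_t u_t - D_i\bigl(\sigma^{ik}_t u_t\bigr)
 &= -\sigma^{ik}_t D_i u_t + \bigl(\beta^k_t - D_i\sigma^{ik}_t\bigr) u_t .
\end{align*}
```

Only first derivatives of the coefficients appear on the right-hand sides, which is what allows the filtering equation to be treated in divergence form under the Lipschitz assumption.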

Finally, by $\mathcal{F}^y_t$ we denote the completion of $\sigma\{y_s : s \le t\}$ with respect to $P$, $\mathcal{F}$.

Let us consider the following initial value problem

$$d\bar\pi_t(x) = L^*_t(x) \bar\pi_t(x)\,dt + \Lambda^{k*}_t(x) \bar\pi_t(x) \Psi^{kr}_t\,dy^r_t, \tag{3.6}$$

$$\bar\pi_0(x) = \pi_0(x),$$

where $t \in [0, T]$, $x \in \mathbb{R}^d$, and $\bar\pi_t(x) = \bar\pi_t(\omega, x)$. Equation (3.6) is called the Duncan-Mortensen-Zakai or just the Zakai equation.

We understand this equation and the initial condition in the following sense. We are looking for a function $\bar\pi = \bar\pi_t(x) = \bar\pi_t(\omega, x)$, $\omega \in \Omega$, $t \in [0, T]$, $x \in \mathbb{R}^d$, such that

(i) For each $(\omega, t)$, $\bar\pi_t(\omega, x)$ is a generalized function on $\mathbb{R}^d$,

(ii) We have $\bar\pi \in L_p(\Omega \times [0, T], \mathcal{P}, W^1_p)$,

(iii) For each $\varphi \in C_0^\infty(\mathbb{R}^d)$ with probability one for all $t \in [0, T]$ it holds that

$$(\bar\pi_t, \varphi) = (\pi_0, \varphi) - \int_0^t \bigl(a^{ij}_s D_i \bar\pi_s - b^j_s \bar\pi_s + \bar\pi_s D_i a^{ij}_s, D_j \varphi\bigr)\,ds$$
$$- \int_0^t \bigl(\sigma^{ik}_s D_i \bar\pi_s + (D_i \sigma^{ik}_s - \beta^k_s) \bar\pi_s, \varphi\bigr) \Psi^{kr}_s \bigl(B^r(s, z_s)\,ds + \Theta^{rn}(s, y_s)\,dw^n_s\bigr). \tag{3.7}$$

Observe that all expressions in (3.7) are well defined due to the fact that the coefficients of $\bar\pi$ and of $D_i \bar\pi$ are bounded and appropriately measurable and $\bar\pi, D_i \bar\pi \in L_p(\Omega \times [0, T], \mathcal{P}, L_p)$.

Hence, equation (3.6) has the same form as (1.1) and the existence and uniqueness part of Lemma 3.1 below follows from Theorem 2.1. The second assertion of the lemma follows from Theorem 2.3.

Lemma 3.1. There exists a unique solution $\bar\pi$ of (3.6) with initial condition $\pi_0$ in the sense explained above. In addition, $\bar\pi_t \ge 0$ for all $t \in [0, T]$ (a.s.).

Here is a basic result of filtering theory for partially observable diffusion processes. Its relation to the previously known ones is discussed above.

Theorem 3.2. Let $\bar\pi$ be the function from Lemma 3.1. Then

$$0 < \int_{\mathbb{R}^d} \bar\pi_t(x)\,dx = (\bar\pi_t, 1) < \infty \tag{3.8}$$

for all $t \in [0, T]$ (a.s.) and for any $t \in [0, T]$ and real-valued, bounded or nonnegative, (Borel) measurable function $f$ given on $\mathbb{R}^d$

$$E[f(x_t) \mid \mathcal{F}^y_t] = \frac{(\bar\pi_t, f)}{(\bar\pi_t, 1)} \quad \text{(a.s.)}. \tag{3.9}$$

Equation (3.9) shows (by definition) that

$$\pi_t(x) := \frac{\bar\pi_t(x)}{(\bar\pi_t, 1)}$$

is a conditional density of distribution of $x_t$ given $y_s$, $s \le t$. Since, generally, $(\bar\pi_t, 1) \ne 1$, one calls $\bar\pi_t$ an unnormalized conditional density of distribution of $x_t$ given $y_s$, $s \le t$.
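
In terms of the normalized density, (3.8) and (3.9) can be restated as follows. This is only a rewriting of the definitions above, with $\bar\pi_t$ denoting the solution from Lemma 3.1; the division is legitimate because of (3.8).

```latex
% Sketch: consequences of (3.8)-(3.9) for \pi_t(x) := \bar\pi_t(x) / (\bar\pi_t, 1)
\begin{align*}
\int_{\mathbb{R}^d} \pi_t(x)\,dx
 &= \frac{(\bar\pi_t, 1)}{(\bar\pi_t, 1)} = 1, \\
E\bigl[f(x_t) \mid \mathcal{F}^y_t\bigr]
 &= \frac{(\bar\pi_t, f)}{(\bar\pi_t, 1)}
  = \int_{\mathbb{R}^d} f(x)\,\pi_t(x)\,dx \quad \text{(a.s.)}.
\end{align*}
```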

We derive Theorem 3.2 from Theorem 2.2 and the result of [11] where more regularity on the coefficients is assumed.

The following is a direct corollary of embedding theorems from [5].

Theorem 3.3. Let $\pi_0$ be a nonrandom function and $\pi_0 \in W^{1-2/p}_p$ for all $p \ge 2$, which happens, for instance, if $\pi_0$ is a Lipschitz continuous function with compact support. Then for any $\varepsilon \in (0, 1/2)$ almost surely $\bar\pi_t(x)$ is $1/2 - \varepsilon$ Hölder continuous in $t$ with a constant independent of $x$, $\bar\pi_t(x)$ is $1 - \varepsilon$ Hölder continuous in $x$ with a constant independent of $t$, and the above mentioned (random) constants have all moments.